15 research outputs found

    Partitions of R^n with Maximal Seclusion and their Applications to Reproducible Computation

    We introduce and investigate a natural problem regarding unit cube tilings/partitions of Euclidean space and also consider broad generalizations of this problem. The problem fits well within a historical context of similar problems and also has applications to the study of reproducibility in randomized computation. Given $k\in\mathbb{N}$ and $\epsilon\in(0,\infty)$, we define a $(k,\epsilon)$-secluded unit cube partition of $\mathbb{R}^{d}$ to be a unit cube partition of $\mathbb{R}^{d}$ such that for every point $\vec{p}\in\mathbb{R}^d$, the closed $\ell_{\infty}$ $\epsilon$-ball around $\vec{p}$ intersects at most $k$ cubes. The problem is to construct such partitions for each dimension $d$ with the primary goal of minimizing $k$ and the secondary goal of maximizing $\epsilon$. We prove that for every dimension $d\in\mathbb{N}$, there is an explicit and efficiently computable $(k,\epsilon)$-secluded axis-aligned unit cube partition of $\mathbb{R}^d$ with $k=d+1$ and $\epsilon=\frac{1}{2d}$. We complement this construction by proving that for axis-aligned unit cube partitions, the value of $k=d+1$ is the minimum possible, and when $k$ is minimized at $k=d+1$, the value $\epsilon=\frac{1}{2d}$ is the maximum possible. This demonstrates that our constructions are the best possible. We also consider the much broader class of partitions in which every member has at most unit volume and show that $k=d+1$ is still the minimum possible. We also show that for any reasonable $k$ (i.e. $k\leq 2^{d}$), it must be that $\epsilon\leq\frac{\log_{4}(k)}{d}$. This demonstrates that when $k$ is minimized at $k=d+1$, our unit cube constructions are optimal to within a logarithmic factor even for this broad class of partitions. In fact, they are even optimal in $\epsilon$ up to a logarithmic factor when $k$ is allowed to be polynomial in $d$.
We extend the techniques used above to introduce and prove a variant of the KKM lemma, the Lebesgue covering theorem, and Sperner's lemma on the cube which says that for every $\epsilon\in(0,\frac12]$, and every proper coloring of $[0,1]^{d}$, there is a translate of the $\ell_{\infty}$ $\epsilon$-ball which contains points of at least $(1+\frac23\epsilon)^{d}$ different colors. Advisers: N. V. Vinodchandran & Jamie Radcliff
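As a rough illustration of the seclusion definition in this abstract, the Python sketch below counts how many cubes of the plain integer grid partition of $\mathbb{R}^d$ (cubes of the form $[n_1,n_1+1)\times\cdots\times[n_d,n_d+1)$) are met by a closed $\ell_\infty$ $\epsilon$-ball. The function name `grid_cubes_hit` and the choice of the plain grid are assumptions made for illustration only; this is not the partition constructed in the work. At a grid vertex the ball meets $2^d$ cubes, so for $\epsilon<\frac12$ the plain grid is only $(2^d,\epsilon)$-secluded, in contrast with the $k=d+1$ stated in the abstract.

```python
import math


def grid_cubes_hit(p, eps):
    """Count cubes [n_1, n_1+1) x ... x [n_d, n_d+1) of the plain integer
    grid that intersect the closed l_infinity ball of radius eps around p.

    Illustrative helper (hypothetical name), not the paper's construction.
    """
    count = 1
    for x in p:
        # In one coordinate, [x - eps, x + eps] meets [n, n + 1) exactly for
        # the integers n with floor(x - eps) <= n <= floor(x + eps).
        count *= math.floor(x + eps) - math.floor(x - eps) + 1
    return count


if __name__ == "__main__":
    d = 3
    eps = 1 / (2 * d)
    # Ball centred at a grid vertex: touches every cube around that vertex.
    print(grid_cubes_hit((0.0,) * d, eps))
    # Ball centred in the middle of a cube: stays inside that one cube.
    print(grid_cubes_hit((0.5,) * d, eps))
```

For $d=3$ and $\epsilon=\frac16$ the vertex-centred ball meets $2^3=8$ grid cubes, whereas the partition described in the abstract guarantees at most $d+1=4$ for the same $\epsilon$.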

    Comparing Powers of Edge Ideals

    Given a nontrivial homogeneous ideal $I\subseteq k[x_1,x_2,\ldots,x_d]$, a problem of great recent interest has been the comparison of the $r$th ordinary power of $I$ and the $m$th symbolic power $I^{(m)}$. This comparison has been undertaken directly via an exploration of which exponents $m$ and $r$ guarantee the subset containment $I^{(m)}\subseteq I^r$ and asymptotically via a computation of the resurgence $\rho(I)$, a number for which any $m/r > \rho(I)$ guarantees $I^{(m)}\subseteq I^r$. Recently, a third quantity, the symbolic defect, was introduced; as $I^t\subseteq I^{(t)}$, the symbolic defect is the minimal number of generators required to add to $I^t$ in order to get $I^{(t)}$. We consider these various means of comparison when $I$ is the edge ideal of certain graphs by describing an ideal $J$ for which $I^{(t)} = I^t + J$. When $I$ is the edge ideal of an odd cycle, our description of the structure of $I^{(t)}$ yields solutions to both the direct and asymptotic containment questions, as well as a partial computation of the sequence of symbolic defects. Comment: Version 2: Revised based on referee suggestions. Lemma 5.12 was added to clarify the proof of Theorem 5.13. To appear in the Journal of Algebra and its Applications. Version 1: 20 pages. This project was supported by Dordt College's undergraduate research program in summer 201

    List and Certificate Complexities in Replicable Learning

    We investigate replicable learning algorithms. Ideally, we would like to design algorithms that output the same canonical model over multiple runs, even when different runs observe a different set of samples from the unknown data distribution. In general, such a strong notion of replicability is not achievable. Thus we consider two feasible notions of replicability called list replicability and certificate replicability. Intuitively, these notions capture the degree of (non) replicability. We design algorithms for certain learning problems that are optimal in list and certificate complexity. We establish matching impossibility results

    Geometry of Rounding: Near Optimal Bounds and a New Neighborhood Sperner's Lemma

    A partition $\mathcal{P}$ of $\mathbb{R}^d$ is called a $(k,\varepsilon)$-secluded partition if, for every $\vec{p} \in \mathbb{R}^d$, the ball $\overline{B}_{\infty}(\varepsilon, \vec{p})$ intersects at most $k$ members of $\mathcal{P}$. A goal in designing such secluded partitions is to minimize $k$ while making $\varepsilon$ as large as possible. This partition problem has connections to a diverse range of topics, including deterministic rounding schemes, pseudodeterminism, replicability, as well as Sperner/KKM-type results. In this work, we establish near-optimal relationships between $k$ and $\varepsilon$. We show that, for any bounded measure partitions and for any $d\geq 1$, it must be that $k\geq(1+2\varepsilon)^d$. Thus, when $k=k(d)$ is restricted to ${\rm poly}(d)$, it follows that $\varepsilon=\varepsilon(d)\in O\left(\frac{\ln d}{d}\right)$. This bound is tight up to log factors, as it is known that there exist secluded partitions with $k(d)=d+1$ and $\varepsilon(d)=\frac{1}{2d}$. We also provide new constructions of secluded partitions that work for a broad spectrum of $k(d)$ and $\varepsilon(d)$ parameters. Specifically, we prove that, for any $f:\mathbb{N}\rightarrow\mathbb{N}$, there is a secluded partition with $k(d)=(f(d)+1)^{\lceil\frac{d}{f(d)}\rceil}$ and $\varepsilon(d)=\frac{1}{2f(d)}$. These new partitions are optimal up to $O(\log d)$ factors for various choices of $k(d)$ and $\varepsilon(d)$. Based on the lower bound result, we establish a new neighborhood version of Sperner's lemma over hypercubes, which is of independent interest. In addition, we prove a no-free-lunch theorem about the limitations of rounding schemes in the context of pseudodeterministic/replicable algorithms
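The step from the lower bound $k\geq(1+2\varepsilon)^d$ to $\varepsilon\in O\left(\frac{\ln d}{d}\right)$ can be sketched as follows. This is one possible short argument, assuming $k(d)\leq d^{c}$ for a constant $c$ (the constant $c$ is notation introduced here, not taken from the abstract); the paper's own proof may proceed differently.

```latex
% Assume k(d) <= d^c for some constant c > 0 (c is an illustrative symbol).
% If eps > 1/2, then k >= (1 + 2 eps)^d > 2^d, which exceeds d^c for all
% large d, so for large d we may assume 0 < eps <= 1/2.
\begin{align*}
  (1+2\varepsilon)^{d} \le k \le d^{c}
    &\;\Longrightarrow\; d\,\ln(1+2\varepsilon) \le c\,\ln d, \\
  \ln(1+x) \ge \tfrac{x}{2} \ \text{ for } x\in[0,1]
    &\;\Longrightarrow\; \ln(1+2\varepsilon) \ge \varepsilon
       \ \text{ for } \varepsilon\in\left[0,\tfrac12\right], \\
  \text{hence}\quad d\,\varepsilon \le c\,\ln d
    &\;\Longrightarrow\; \varepsilon \le \frac{c\,\ln d}{d}.
\end{align*}
% Consistency check with the known construction k = d + 1, eps = 1/(2d):
% (1 + 2 eps)^d = (1 + 1/d)^d <= e <= d + 1 for d >= 2, and equals 2 = d + 1
% for d = 1, so the lower bound is respected.
```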

    Neighborhood Variants of the KKM Lemma, Lebesgue Covering Theorem, and Sperner's Lemma on the Cube

    We establish a "neighborhood" variant of the cubical KKM lemma and the Lebesgue covering theorem and deduce a discretized version which is a "neighborhood" variant of Sperner's lemma on the cube. The main result is the following: for any coloring of the unit $d$-cube $[0,1]^d$ in which points on opposite faces must be given different colors, and for any $\varepsilon>0$, there is an $\ell_\infty$ $\varepsilon$-ball which contains points of at least $(1+\frac{\varepsilon}{1+\varepsilon})^d$ different colors (so in particular, at least $(1+\frac{2}{3}\varepsilon)^d$ different colors for all sensible $\varepsilon\in(0,\frac12]$). Comment: 18 pages plus appendices (30 pages total), 3 figure
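The parenthetical simplification in this abstract, from $(1+\frac{\varepsilon}{1+\varepsilon})^d$ to $(1+\frac{2}{3}\varepsilon)^d$, follows from an elementary comparison of the bases; a short check, written out here for convenience:

```latex
% For eps in (0, 1/2] we have 1 + eps <= 3/2, so 1/(1 + eps) >= 2/3.
\begin{align*}
  0<\varepsilon\le\tfrac12
    &\;\Longrightarrow\; \frac{1}{1+\varepsilon}\ge\frac{2}{3}
     \;\Longrightarrow\; \frac{\varepsilon}{1+\varepsilon}\ge\frac{2}{3}\,\varepsilon, \\
  \text{hence}\quad
  \left(1+\frac{\varepsilon}{1+\varepsilon}\right)^{d}
    &\ge \left(1+\frac{2}{3}\,\varepsilon\right)^{d}
     \quad\text{for every } d\ge 1 .
\end{align*}
```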

    Geometry of Rounding

    Rounding has proven to be a fundamental tool in theoretical computer science. By observing that rounding and partitioning of $\mathbb{R}^d$ are equivalent, we introduce the following natural partition problem which we call the secluded hypercube partition problem: Given $k\in \mathbb{N}$ (ideally small) and $\epsilon>0$ (ideally large), is there a partition of $\mathbb{R}^d$ with unit hypercubes such that for every point $p \in \mathbb{R}^d$, its closed $\epsilon$-neighborhood (in the $\ell_{\infty}$ norm) intersects at most $k$ hypercubes? We undertake a comprehensive study of this partition problem. We prove that for every $d\in \mathbb{N}$, there is an explicit (and efficiently computable) hypercube partition of $\mathbb{R}^d$ with $k = d+1$ and $\epsilon = \frac{1}{2d}$. We complement this construction by proving that the value of $k=d+1$ is the best possible (for any $\epsilon$) for a broad class of "reasonable" partitions including hypercube partitions. We also investigate the optimality of the parameter $\epsilon$ and prove that any partition in this broad class that has $k=d+1$ must have $\epsilon\leq\frac{1}{2\sqrt{d}}$. These bounds imply limitations of certain deterministic rounding schemes existing in the literature. Furthermore, this general bound is based on the currently known lower bounds for the dissection number of the cube, and improvements to this bound will yield improvements to our bounds. While our work is motivated by the desire to understand rounding algorithms, one of our main conceptual contributions is the introduction of the secluded hypercube partition problem, which fits well with a long history of investigations by mathematicians on various hypercube partitions/tilings of Euclidean space

    Epigenome Wide Association Study of SNP–CpG Interactions on Changes in Triglyceride Levels after Pharmaceutical Intervention: A GAW20 Analysis

    In the search for an understanding of how genetic variation contributes to the heritability of common human disease, the potential role of epigenetic factors, such as methylation, is being explored with increasing frequency. Although standard analyses test for associations between methylation levels at individual cytosine-phosphate-guanine (CpG) sites and phenotypes of interest, some investigators have begun testing for methylation and how methylation may modulate the effects of genetic polymorphisms on phenotypes. In our analysis, we used both a genome-wide and candidate gene approach to investigate potential single-nucleotide polymorphism (SNP)–CpG interactions on changes in triglyceride levels. Although we were able to identify numerous loci of interest when using an exploratory significance threshold, we did not identify any significant interactions using a strict genome-wide significance threshold. We were also able to identify numerous loci using the candidate gene approach, in which we focused on 18 genes with prior evidence of association with triglyceride levels. In particular, we identified GALNT2 loci as containing potential CpG sites that moderate the impact of genetic polymorphisms on triglyceride levels. Further work is needed to provide clear guidance on analytic strategies for testing SNP–CpG interactions, although leveraging prior biological understanding may be needed to improve statistical power in data sets with smaller sample sizes

    Evaluating the Performance of Gene-Based Tests of Genetic Association when Testing for Association Between Methylation and Change in Triglyceride Levels at GAW20

    Although methylation data continues to rise in popularity, much is still unknown about how to best analyze methylation data in genome-wide analysis contexts. Given continuing interest in gene-based tests for next-generation sequencing data, we evaluated the performance of novel gene-based test statistics on simulated data from GAW20. Our analysis suggests that most of the gene-based tests are detecting real signals and maintaining the Type I error rate. The minimum p-value and threshold-based tests performed well compared to single-marker tests in many cases, especially when the number of variants was relatively large with few true causal variants in the set

    A Genome-Wide Association Study of Red-Blood Cell Fatty Acids and Ratios Incorporating Dietary Covariates: Framingham Heart Study Offspring Cohort

    Recent analyses have suggested a strong heritable component to circulating fatty acid (FA) levels; however, only a limited number of genes have been identified which associate with FA levels. In order to expand upon a previous genome-wide association study done on participants in the Framingham Heart Study Offspring Cohort and FA levels, we used data from 2,400 of these individuals for whom red blood cell FA profiles, dietary information and genotypes are available, and then conducted a genome-wide evaluation of potential genetic variants associated with 22 FAs and 15 FA ratios, after adjusting for relevant dietary covariates. Our analysis found nine previously identified loci associated with FA levels (FADS, ELOVL2, PCOLCE2, LPCAT3, AGPAT4, NTAN1/PDXDC1, PKD2L1, HBS1L/MYB and RAB3GAP1/MCM6), while identifying four novel loci. The latter include an association between variants in CALN1 (Chromosome 7) and eicosapentaenoic acid (EPA), DHRS4L2 (Chromosome 14) and a FA ratio measuring delta-9-desaturase activity, as well as two loci associated with less well understood proteins. Thus, the inclusion of dietary covariates had a modest impact, helping to uncover four additional loci. While genome-wide association studies continue to uncover additional genes associated with circulating FA levels, much of the heritable risk is yet to be explained, suggesting the potential role of rare genetic variation, epistasis and gene-environment interactions on FA levels as well. Further studies are needed to continue to understand the complex genetic picture of FA metabolism and synthesis